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Abstract 


The last few years have seen an explosion of academic and popular interest in algorithmic 
fairness. Despite this interest and the volume and velocity of work that has been produced 
recently, the fundamental science of fairness in machine learning is still in a nascent state. In 
March 2018, we convened a group of experts as part of a CCC visioning workshop to assess the 
state of the field, and distill the most promising research directions going forward. This report 
summarizes the findings of that workshop. Along the way, it surveys recent theoretical work in 
the field and points towards promising directions for research. 


1 Introduction 


The last decade has seen a vast increase both in the diversity of applications to which machine 
learning is applied, and to the import of those applications. Machine learning is no longer just 
the engine behind ad placements and spam filters: it is now used to filter loan applicants, deploy 
police officers, and inform bail and parole decisions, amongst other things. The result has been a 
major concern for the potential for data driven methods to introduce and perpetuate discriminatory 
practices, and to otherwise be unfair. And this concern has not been without reason: a steady 
stream of empirical findings has shown that data driven methods can unintentionally both encode 
existing human biases and introduce new ones (see e.g. for notable 
examples). 

At the same time, the last two years have seen an unprecedented explosion in interest from 
the academic community in studying fairness and machine learning. “Fairness and transparency” 
transformed from a niche topic with a trickle of papers produced every year (at least since the work 
of [PRT08]) to a major subfield of machine learning, complete with a dedicated archival conference 
(ACM FAT*). But despite the volume and velocity of published work, our understanding of the 
fundamental questions related to fairness and machine learning remain in its infancy. What should 
fairness mean? What are the causes that introduce unfairness in machine learning? How best 
should we modify our algorithms to avoid unfairness? And what are the corresponding tradeoffs 
with which we must grapple? 

In March 2018, we convened a group of about fifty experts in Philadelphia, drawn from 
academia, industry, and government, to asses the state of our understanding of the fundamentals of 
the nascent science of fairness in machine learning, and to identify the unanswered questions that 
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seem the most pressing. By necessity, the aim of the workshop was not to comprehensively cover 
the vast growing field, much of which is empirical. Instead, the focus was on theoretical work aimed 
at providing a scientific foundation for understanding algorithmic bias. This document captures 
several of the key ideas and directions discussed. 


2 What We Know 


2.1 Causes of Unfairness 


Even before we precisely specify what we mean by “fairness”, we can identify common distortions 
that can lead off-the-shelf machine learning techniques to produce behavior that is intuitively unfair. 
These include: 


1. Bias Encoded in Data: Often, the training data that we have on hand already includes 
human biases. For example, in the problem of recidivism prediction used to inform bail and 
parole decisions, the goal is to predict whether an inmate, if released, will go on to commit 
another crime within a fixed period of time. But we do not have data on who commits crimes 
— we have data on who is arrested. There is reason to believe that arrest data — especially 
for drug crimes — is skewed towards minority populations that are policed at a higher rate 
[Rot14]. Of course, machine learning techniques are designed to fit the data, and so will 
naturally replicate any bias already present in the data. There is no reason to expect them 
to remove existing bias. 


2. Minimizing Average Error Fits Majority Populations: Different populations of people 
have different distributions over features, and those features have different relationships to 
the label that we are trying to predict. As an example, consider the task of predicting 
college performance based on high school data. Suppose there is a majority population and 
a minority population. The majority population employs SAT tutors and takes the exam 
multiple times, reporting only the highest score. The minority population does not. We 
should naturally expect both that SAT scores are higher amongst the majority population, 
and that their relationship to college performance is differently calibrated compared to the 
minority population. But if we train a group-blind classifier to minimize overall error, if it 
cannot simultaneously fit both populations optimally, it will fit the majority population. This 
is because — simply by virtue of their numbers — the fit to the majority population is more 
important to overall error than the fit to the minority population. This leads to a different 
(and higher) distribution of errors in the minority population. This effect can be quantified, 
and can be partially alleviated via concerted data gathering efforts [CJS18}. 


3. The Need to Explore: In many important problems, including recidivism prediction and 
drug trials, the data fed into the prediction algorithm depends on the actions that algorithm 
has taken in the past. We only observe whether an inmate will recidivate if we release him. 
We only observe the efficacy of a drug on patients to whom it is assigned. Learning theory tells 
us that in order to effectively learn in such scenarios, we need to explore — i.e. sometimes 
take actions we believe to be sub-optimal in order to gather more data. This leads to at 
least two distinct ethical questions. First, when are the individual costs of exploration borne 
disproportionately by a certain sub-population? Second, if in certain (e.g. medical) scenarios, 


we view it as immoral to take actions we believe to be sub-optimal for any particular patient, 
how much does this slow learning, and does this lead to other sorts of unfairness? 


2.2 Definitions of Fairness 


With a few exceptions, the vast majority of work to date on fairness in machine learning has focused 
on the task of batch classification. At a high level, this literature has focused on two main families 
of definitiond!} statistical notions of fairness and individual notions of fairness. We briefly review 
what is known about these approaches to fairness, their advantages, and their shortcomings. 


2.2.1 Statistical Definitions of Fairness 


Most of the literature on fair classification focuses on statistical definitions of fairness. This 
family of definitions fixes a small number of protected demographic groups G (such as racial 
groups), and then ask for (approximate) parity of some statistical measure across all of these 
groups. Popular measures include raw positive classification rate, considered in work such as 
(also sometimes known as statistical parity [DHP*19}), false 
positive and false negative rates (also sometimes known as 
equalized odds [HPS16]), and positive predictive value (closely related to equal- 
ized calibration when working with real valued risk scores). There are others — see e.g. 
for a more exhaustive enumeration. This family of fairness definitions is attractive because it is 
simple, and definitions from this family can be achieved without making any assumptions on the 
data and can be easily verified. However, statistical definitions of fairness do not on their own give 
meaningful guarantees to individuals or structured subgroups of the protected demographic groups. 
Instead they give guarantees to “average” members of the protected groups. (See for a 
litany of ways in which statistical parity and similar notions can fail to provide meaningful guar- 
antees, and for examples of how some of these weaknesses carry over to definitions 
which equalize false positive and negative rates.) Different statistical measures of fairness can be 
at odds with one another. For example, |Chol7| and prove a fundamental impossibility 
result: except in trivial settings, it is impossible to simultaneously equalize false positive rates, false 
negative rates, and positive predictive value across protected groups. Learning subject to statistical 
fairness constraints can also be computationally hard |WGOS17), although practical algorithms of 


various sorts are known JHPS16} [ZVGG17 |ABD* 18]. 


2.2.2 Individual Definitions of Fairness 


Individual notions of fairness, on the other hand, ask for constraints that bind on specific pairs 
of individuals, rather than on a quantity that is averaged over groups. For example, 
give a definition which roughly corresponds to the constraint that “similar individuals should be 
treated similarly”, where similarity is defined with respect to a task-specific metric that must be 
determined on a case by case basis. suggest a definition which roughly corresponds to 
“less qualified individuals should not be favored over more qualified individuals”, where quality is 
defined with respect to the true underlying label (unknown to the algorithm). However, although 


'There is also an emerging line of work that considers causal notions of fairness (see e.g., |KCP*17| [KLRS17 
NS18]). We intentionally avoided discussions of this potentially important direction because it will be the subject of 
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the semantics of these kinds of definitions can be more meaningful than statistical approaches to 
fairness, the major stumbling block is that they seem to require making significant assumptions. 
For example, the approach of pre-supposes the existence of an agreed upon similarity 
metric, whose definition would itself seemingly require solving a non-trivial problem in fairness, 
and the approach of seems to require strong assumptions on the functional form of the 
relationship between features and labels in order to be usefully put into practice. These obstacle are 
serious enough that it remains unclear whether individual notions of fairness can be made practical 
— although attempting to bridge this gap is an important and ongoing research agenda. 


3 Questions at the Research Frontier 


3.1 Between Statistical and Individual Fairness 


Given the limitations of extant notions of fairness, is there a way to get some of the “best of 
both worlds”? In other words, constraints that are practically implementable without the need 
for making strong assumptions on the data or the knowledge of the algorithm designer, but which 
nevertheless provide more meaningful guarantees to individuals? Two recent papers, |KNRW18b 
and (see also for empirical evaluations of the algorithms proposed 
in these papers), attempt to do this by asking for statistical fairness definitions to hold not just on 
a small number of protected groups, but on an exponential or infinite class of groups defined by 
some class of functions of bounded complexity. This approach seems promising: because ultimately 
they are asking for statistical notions of fairness, the approaches proposed by these papers enjoy 
the benefits of statistical fairness: that no assumptions need be made about the data, nor is 
any external knowledge (like a fairness metric) needed. It also better addresses concerns about 
“intersectionality”, a term used to describe how different kinds of discrimination can compound 
and interact for individuals who fall at the intersection of several protected classes. 

At the same time, the approach raises a number of additional questions: what function classes 
are reasonable, and once one is decided upon (e.g. conjunctions of protected attribures) what 
features should be “protected”? Should these only be attributes that are sensitive on their own, 
like race and gender, or might attributes that are innocuous on their own correspond to groups we 
wish to protect once we consider their intersection with protected attributes (for example clothing 
styles intersected with race or gender)? Finally, this family of approaches significantly mitigates 
some of the weaknesses of statistical notions of fairness by asking for the constraints to hold on 
average not just over a small number of coarsely defined groups, but over very finely defined groups 
as well. Ultimately, however, it inherits the weaknesses of statistical fairness as well, just on a more 
limited scale. 

Another recent line of work aims to weaken the strongest assumption needed for the notion of 
individual fairness from [DHP* 12): namely that the algorithm designer has perfect knowledge of 
a “fairness metric”. assume that the algorithm has access to an oracle which can return 
an unbiased estimator for the distance between two randomly drawn individuals according to an 
unknown fairness metric, and show how to use this to ensure a statistical notion of fairness related 
to which informally states that “on average, individuals in two groups 
should be treated similarly if on average the individuals in the two groups are similar” — and 
this can be achieved with respect to an exponentially or infinitely large set of groups. Similarly, 
assumes the existence of an oracle which can identify fairness violations when they are 


made in an online setting, but cannot quantify the extent of the violation (with respect to the 
unknown metric). It is shown that when the metric is from a specific learnable family, this kind 
of feedback is sufficient to obtain an optimal regret bound to the best fair classifier while having 
only a bounded number of violations of the fairness metric. [RY18| consider the case in which the 
metric is known, and show that a PAC-inspired approximate variant of metric fairness generalizes 
to new data drawn from the same underlying distribution. Ultimately, however, these approaches 
all assume that fairness is perfectly defined with respect to some metric, and that there is some sort 
of direct access to it. Can these approaches be generalized to a more “agnostic” setting, in which 
fairness feedback is given by human beings who may not be responding in a way that is consistent 
with any metric? 


3.2 Data Evolution and Dynamics of Fairness 


The vast majority of work in computer science on algorithmic fairness has focused on one-shot 
classification tasks. But real algorithmic systems consist of many different components that are 
combined together, and operate in complex environments that are dynamically changing, some- 
times because of the actions of the learning algorithm itself. For the field to progress, we need to 
understand the dynamics of fairness in more complex systems. 

Perhaps the simplest aspect of dynamics that remains poorly understood is how and when 
components that may individually satisfy notions of fairness compose into larger constructs that 
still satisfy fairness guarantees. For example, if the bidders in an advertising auction individually 
are fair with respect to their bidding decisions, when will the allocation of advertisements be 
“fair”, and when will it not? and have made a preliminary foray in this direction. 
These papers embark on a systematic study of fairness under composition, and find that often the 
composition of multiple fair components will not satisfy any fairness constraint at all. Similarly, 
the individual components of a “fair” system may appear to be unfair in isolation. There are 
certain special settings, e.g. the “filtering pipeline” scenario of — modeling a scenario 
in which a job applicant is selected only if she is selected at every stage of the pipeline — in which 
(multiplicative approximations of) statistical fairness notions compose in a well behaved way. But 
the high level message from these works is that our current notions of fairness compose poorly. 
Experience from differential privacy suggests that graceful degradation under 
composition is key to designing complicated algorithms satisfying desirable statistical properties, 
because it allows algorithm design and analysis to be modular. Thus, it seems important to find 
satisfying fairness definitions and richer frameworks that behave well under composition. 

In dealing with socio-technical systems, it is also important to understand how algorithms 
dynamically effect their environment, and the incentives of human actors. For example, if the 
bar (for e.g. college admission) is lowered for a group of individuals, this might increase the 
average qualifications for this group over time because of at least two effects: a larger proportion 
of children in the next generation grow up in households with college educated parents (and the 
opportunities this provides), and the fact that a college education is achievable can incentivize 
effort to prepare academically. These kinds of effects are not considered when considering either 
statistical or individual notions of fairness in one-shot learning settings. The economics literature 
on affirmative action has long considered such effects — although not with the specifics of machine 
learning in mind: see e.g. [Bec10]. More recently, there have been some preliminary 
attempts to model these kinds of effects in machine learning settings — e.g. by modeling the 
environment as a markov decision process |JJK~*17|, considering the equilibrium effects of imposing 


statistical definitions of fairness in a model of a labor market [HC18], specifying the functional 
relationship between classification outcomes and quality [LDR+18}, or by considering the effect of 
a classifier on a downstream Bayesian decision maker [KRZ18]. However, the specific predictions 
of most of the models of this sort are brittle to the specific modeling assumptions made — they 
point to the need to consider long term dynamics, but do not provide robust guidance for how to 
navigate them. More work is needed here. 

Finally, decision making is often distributed between a large number of actors who share different 
goals and do not necessarily coordinate. In settings like this, in which we do not have direct control 
over the decision making process, it is important to think about how to incentivize rational agents 
to behave in a way that we view as fair. takes a preliminary stab at this task, showing 
how to incentivize a particular notion of individual fairness in a simple, stylized setting, using small 
monetary payments. But how should this work for other notions of fairness, and in more complex 
settings? Can this be done by controlling the flow of information, rather than by making monetary 
payments (monetary payments might be distasteful in various fairness-relevant settings)? More 
work is needed here as well. Finally, take a welfare maximization view of fairness in 
classification, and characterize the cost of imposing additional statistical fairness constraints as 
well. But this is done in a static environment. How would the conclusions change under a dynamic 
model? 


3.3 Modeling and Correcting Bias in the Data 


Fairness concerns typically surface precisely in settings where the available training data is already 
contaminated by bias. The data itself is often a product of social and historical process that op- 
erated to the disadvantage of certain groups. When trained in such data, off-the-shelf machine 
learning techniques may reproduce, reinforce, and potentially exacerbate existing biases. Under- 
standing how bias arises in the data, and how to correct for it, are fundamental challenges in the 
study of fairness in machine learning. 

demonstrate how machine learning can reproduce biases in their analysis of the pop- 
ular word2vec embedding trained on a corpus of Google News texts (parallel effects were indepen- 
dently discovered by [CBN17]). The authors show that the trained embedding exhibit female/male 
gender stereotypes, learning that “doctor” is more similar to man than to woman, along with 
analogies such as “man is to computer programmer as woman is to homemaker”. Even if such 
learned associations accurately reflect patterns in the source text corpus, their use in automated 
systems may exacerbate existing biases. For instance, it might result in male applicants being 
ranked more highly than equally qualified female applicants in queries related to jobs that the 
embedding identifies as male-associated. 

Similar risks arise whenever there is potential for feedback loops. These are situations where 
the trained machine learning model informs decisions that then affect the data collected for future 
iterations of the training process. demonstrate how feedback loops might arise in predictive 
policing if arrest data were used to train the mode. In a nutshell, since police are likely to 
make more arrests in more heavily policed areas, using arrest data to predict crime hotspots will 
disproportionately concentrate policing efforts on already over-policed communities. Expanding 
on this analysis, find that incorporating community-driven data such as crime reporting 


Predictive policing models are generally proprietary, and so it is not clear whether arrest data is used to train 
the model in any deployed system. 


helps to attenuate the biasing feedback effects. The authors also propose a strategy for accounting 
for feedback by adjusting arrest counts for policing intensity. The success of the mitigation strategy 
of course depends on how well the simple theoretical model reflects the true relationships between 
crime intensity, policing, and arrests. Problematically, such relationships are often unknown, and 
are very difficult to infer from data. This situation is by no means specific to predictive policing. 

Correcting for data bias generally seems to require knowledge of how the measurement process 
is biased, or judgments about properties the data would satisfy in an “unbiased” world. 
formalize this as a disconnect between the observed space—features that are observed in the data, 
such as SAT scores—and the unobservable construct space—features that form the desired basis 
for decision making, such as intelligence. Within this framework, data correction efforts attempt 
to undo the effects of biasing mechanisms that drive discrepancies between these spaces. To the 
extent that the biasing mechanism cannot be inferred empirically, any correction effort must make 
explicit its underlying assumptions about this mechanism. What precisely is being assumed about 
the construct space? When can the mapping between the construct space and the observed space 
be learned and inverted? What form of fairness does the correction promote, and at what cost? 
The costs are often immediately realized, whereas the benefits are less tangible. We will directly 
observe reductions in prediction accuracy, but any gains hinge on a belief that the observed world 
is not one we should seek to replicate accurately in the first place. This is an area where tools from 
causality may offer a principled approach for drawing valid inference with respect to unobserved 
counterfactually ‘fair’ worlds. 


3.4 Fair Representations 


Fair representation learning is a data de-biasing process that produces transformations (interme- 
diate representations) of the original data that retain as much of the task-relevant information as 
possible while removing information about sensitive or protected attributes. This is one approach 
to transforming biased observational data in which group membership may be inferred from other 
features, to a construct space where protected attributes are statistically independent of other fea- 
tures. First introduced in the work of [ZWS*13], fair representation learning produces a de-biased 
data set that may in principle be used by other parties without any risk of disparate outcomes. 
FFM*15| and formalize this idea by showing how the disparate impact of a decision 
rule is bounded in terms of its balanced error rate as a predictor of the sensitive attribute. 
Several recent papers have introduced new approaches for constructing fair representations. 
FFM 15] propose rank-preserving procedures for repairing features to reduce or remove pairwise 
dependence with the protected attribute. build upon this work, introducing a likelihood- 
based approach that can additionally handle continuous protected attributes, discrete features, and 
which promotes joint independence between the transformed features and the protected attributes. 
There is also a growing literature on using adversarial learning to achieve group fairness in the form 
of statistical parity or false positive/false negative rate balance IMCPZ18]. 
Existing theory shows that the fairness promoting benefits of fair representation learning rely 
critically on the extent to which existing associations between the transformed features and the pro- 
tected characteristics are removed. Adversarial downstream users may be able to recover protected 
attribute information if their models are more powerful than those used initially to obfuscate the 
data. This presents a challenge both to the generators of fair representations as well as to auditors 
and regulators tasked with certifying that the resulting data is fair for use. More work is needed to 
understand the implications of fair representation learning for promoting fairness in the real world. 


3.5 Beyond Classification 


Although the majority of the work on fairness in machine learning focuses on batch classification, 
batch classification is only one aspect of how machine learning is used. Much of machine learning 
—e.g. online learning, bandit learning, and reinforcement learning — focuses on dynamic settings 
in which the actions of the algorithm feed back into the data it observes. These dynamic settings 
capture many problems for which fairness is a concern. For example, lending, criminal recidivism 
prediction, and sequential drug trials all are so-called bandit learning problems, in which the algo- 
rithm cannot observe data corresponding to counterfactuals. We cannot see whether someone not 
granted a loan would have paid it back. We cannot see whether an inmate not released on parole 
would have gone on to commit another crime. We cannot see how a patient would have responded 
to a different drug. 

The theory of learning in bandit settings is well understood, and it is characterized by a need 
to trade off exploration with exploitation. Rather than always making a myopically optimal deci- 
sion, when counter-factuals cannot be observed, it is necessary for algorithms to sometimes take 
actions that appear to be sub-optimal so as to gather more data. But in settings in which deci- 
sions correspond to individuals, this means sacrificing the well-being of a particular person for the 
potential benefit of future individuals. This can sometimes be unethical, and a source of unfairness 
[BBCT16}. Several recent papers explore this issue. For example, and give 
conditions under which linear learners need not explore at all in bandit settings, thereby allowing 
for best-effort service to each arriving individual, obviating the tension between ethical treatment 
of individuals and learning. |RSVW418] show that the costs associated with exploration can be un- 
fairly bourn by a structured sub-population, and that counter-intuitively, those costs can actually 
increase when they are included with a majority population, even though more data increases the 
rate of learning overall. However, these results are all preliminary: they are restricted to settings 
in which the learner is learning a linear policy, and the data really is governed by a linear model. 
While illustrative, more work is needed to understand real-world learning in online settings, and 
the ethics of exploration. 

There is also some work on fairness in machine learning in other settings — for example, rank- 
ing ICSV17], selection IKR18}, personalization [CV17], bandit learning JKMT 18) 
LRD+17|, human-classifier hybrid decision systems [MPZ17], and reinforcement learning |JJK*17| 
DTB17|. But outside of classification, the literature is relatively sparse. This should be rectified, 
because there are interesting and important fairness issues that arise in other settings — especially 
when there are combinatorial constraints on the set of individuals that can be selected for a task, 
or when there is a temporal aspect to learning. 
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